Summary of Research Activities and Findings


1. Helicopter Gearbox Anomaly Detection 

The initial aim of this project was to provide machine learning support for failure 
prediction in helicopter gearboxes. We sought to develop anomaly, or outlier, detectors 
based on accelerometer measurements of gearbox vibration. Due to the large variability 
of vibration signatures with the aircraft’s dynamical state (e.g. maneuvers), we 
recognized early that useful outlier detection would require knowledge of the state to 
allow conditioning. 

Our initial studies were aimed at using features derived from vibration total RMS power
and vibration spectra to identify aircraft maneuvers via (unsupervised) clustering.

The gearbox data for these studies consisted of the instantaneous signal from six
accelerometers time-synchronously averaged at three different periods: the pinion, bevel,
and rotor periods. The raw data is thus an 18-dimensional time series. These signals
were available for 14 maneuvers, which we clustered into 9 classes based on symmetries.

RMS Power - RMS power was calculated from the entire time series (~34 sec) at each
maneuver. To enhance clustering and aid visualization, we applied both PCA and
discriminatory feature selection to reduce the signal dimension from 18 to 7.
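As an illustration of the feature described above, the following minimal sketch computes one RMS value per channel. The report does not state whether the RMS amplitude or the mean-square power was used; this sketch assumes the RMS amplitude, and the function names and toy signals are hypothetical.

```python
import math

def rms_power(samples):
    """Root-mean-square value of one channel's samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def rms_feature_vector(channels):
    """One RMS value per channel; 18 channels give an 18-dim feature vector."""
    return [rms_power(ch) for ch in channels]

# Hypothetical toy data: 18 channels (6 accelerometers x 3 averaging periods),
# each a sine wave whose amplitude grows with the channel index.
channels = [
    [(k + 1) * math.sin(2 * math.pi * n / 50) for n in range(200)]
    for k in range(18)
]
features = rms_feature_vector(channels)
```

Each maneuver then contributes one such vector, which can be pruned or projected before clustering.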

Several clustering techniques were applied, with cross-validation used to determine the
number of clusters. Gaussian mixture models severely underestimate the number of clusters
(typically 3), yielding poor discrimination between the maneuvers (37% classification
rate). Entropy-constrained k-means (standard k-means with a regularizer consisting of
an entropy penalty to encourage small models) produces good classification (89%
classification rate) but grossly overestimates the number of clusters (typically ~28). The
k-means classifier accuracy is comparable to results obtained by NASA ARC scientists
on the same data using a (supervised) neural network classifier. Entropy-constrained
adaptive PCA typically gives 6 clusters and a 65% classification rate. The local
dimensions of the clusters range from 0 to 5.
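The entropy penalty mentioned above can be sketched as follows. This is a minimal illustration in the style of entropy-constrained vector quantization, not the project's implementation: each point is assigned to the cluster minimizing squared distance plus `lam` times the negative log of the cluster's prior, and clusters that lose all their points are dropped. The function name `ec_kmeans`, the penalty weight `lam`, and the toy data are assumptions made for this example.

```python
import math
import random

def ec_kmeans(points, k_init, lam, iters=30, seed=0):
    """Entropy-constrained k-means sketch; returns (centers, priors, labels)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k_init)]
    priors = [1.0 / k_init] * k_init
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment: squared distance plus lam * (-log prior) penalty.
        for n, p in enumerate(points):
            costs = [
                sum((a - b) ** 2 for a, b in zip(p, c)) - lam * math.log(q)
                for c, q in zip(centers, priors)
            ]
            labels[n] = costs.index(min(costs))
        # Update: drop empty clusters, then recompute centers and priors.
        kept = sorted(set(labels))
        remap = {old: new for new, old in enumerate(kept)}
        labels = [remap[l] for l in labels]
        centers, priors = [], []
        for j in range(len(kept)):
            members = [p for p, l in zip(points, labels) if l == j]
            centers.append([sum(col) / len(members) for col in zip(*members)])
            priors.append(len(members) / len(points))
    return centers, priors, labels

# Hypothetical toy data: two well-separated 2-D blobs.
data_rng = random.Random(1)
points = [(data_rng.gauss(0, 0.3), data_rng.gauss(0, 0.3)) for _ in range(40)]
points += [(data_rng.gauss(5, 0.3), data_rng.gauss(5, 0.3)) for _ in range(40)]
centers, priors, labels = ec_kmeans(points, k_init=8, lam=2.0)
```

With `lam = 0` the assignment step reduces to standard k-means; larger values make rarely used clusters more expensive, so they tend to empty out and be removed.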

Spectral Features - Spectra carry more detailed information than the total RMS power, 
and are expected to be important in anomaly detection. Auto-regressive spectral 
estimates failed to discriminate between maneuvers, so we turned to Welch-averaged 
estimates. We explored an extensive range of averaging windows reflecting a wide 
coverage of the bias-variance tradeoff in the spectral estimates. We further explored 
several techniques for concatenating and normalizing the spectra from the six 
accelerometers. Results indicated that, with properly chosen Welch-averaging and
concatenation, unsupervised maneuver classification is comparable to, but not better
than, that obtained from RMS power features. Clustering based on the FFT of the
time-synchronous average accelerometer traces did not perform as well as the best
Welch-averaged spectra.
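For readers unfamiliar with Welch averaging, the sketch below shows the basic idea in plain Python: the signal is cut into overlapping segments, each segment is windowed and transformed, and the resulting power spectra are averaged. Shorter segments give more averages (lower variance) but coarser frequency resolution (higher bias). The Hann window, 50% overlap, and function names are assumptions for illustration; the report does not specify the windows actually used.

```python
import cmath
import math

def periodogram(segment):
    """Hann-windowed power spectrum of one segment (non-negative frequencies)."""
    n = len(segment)
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]
    xw = [s * w for s, w in zip(segment, window)]
    norm = sum(w * w for w in window)
    spectrum = []
    for k in range(n // 2 + 1):
        # Direct DFT at bin k; adequate for short illustrative segments.
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(xw))
        spectrum.append(abs(acc) ** 2 / norm)
    return spectrum

def welch(signal, seg_len, overlap=0.5):
    """Average the periodograms of overlapping segments of length seg_len."""
    step = max(1, int(seg_len * (1 - overlap)))
    starts = range(0, len(signal) - seg_len + 1, step)
    spectra = [periodogram(signal[s:s + seg_len]) for s in starts]
    return [sum(col) / len(spectra) for col in zip(*spectra)]

# Hypothetical test signal: a sinusoid with exactly 8 cycles per 64 samples.
signal = [math.sin(2 * math.pi * 8 * n / 64) for n in range(512)]
psd = welch(signal, seg_len=64)
```

Varying `seg_len` is one way to explore the bias-variance tradeoff described above.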

Nonstationarity - Marianne Mosher at NASA ARC determined that accelerometer 
signals are not stationary over the 34-second period used in the time-synchronous 
averages. She suggested timescales over which the signals are stationary. We found that
the suggested short-time averaged spectra are less easily clustered by maneuver; that is,
maneuvers overlap more in this representation. Presumably, the shorter time averages
contribute to noisy spectra, and the required Welch-averaging smooths over
discriminatory information.

The nonstationarity results suggested that maneuvers are an insufficient specification of
dynamical state. More refined indicators are required. Flight-bus data could provide the
fine-scale dynamical state information required to understand the relationship between
flight state and vibration signature during nonstationary flight. Based on this and earlier
results, in October 2002 we requested pooled vibration and flight-bus data. These data
were not available until late in March 2003, by which time we had redirected our research
thrust to remote earth observing data.


2. Application of Complexity-Penalized Clustering to Segmentation of EOS Data 

In collaboration with Ashok Srivastava at ARC, in early 2003 we began investigating the
use of novel clustering techniques for exploration and segmentation of multi-channel 
imaging spectrometer data from NASA Earth Observing satellites. Dr. Srivastava had 
been using kernel-based clustering for segmentation of EOS images. His initial 
exploration on an image of Greenland turned up an unexpected identification of a 
possible ice-melt region. 



The results of clustering algorithms are sensitive to initial conditions, and Dr. Srivastava 
voiced an interest in obtaining low-variability alternatives to the algorithms he has been 
using. The entropy-penalized clustering algorithms we had been exploring with 
helicopter data have a natural mechanism for suppressing variability, and like the kernel 
methods, have more flexible modeling capability than standard approaches. This led to 
our collaboration on these problems. 

Our initial studies explored application of several entropy-constrained algorithms to 
portions of multi-channel spectrometer images of Sicily and of Greenland. We 
reproduced Dr. Srivastava’s segmentation with slight differences in the boundary. 

We found that an entropy-constrained k-means algorithm provides lower variability with 
respect to initial conditions than does unconstrained k-means, or our adaptive PCA 
algorithms. We have not yet compared our variability results with Dr. Srivastava’s, 
though we find very robust replication of the segmentation feature he discovered, albeit 
with small variations of the boundary. 

We explored the use of a genetic algorithm clustering to reduce variability. Our study 
showed that the computational complexity is unfavorable with respect to simple 
clustering with multiple restarts. 

Finally, and most productively, we explored incorporating hints to help constrain
clustering. These hints consist of human-induced biases that encourage, or discourage,
co-clustering of a small number of pairs of datapoints. This is a form of prior knowledge
that is weaker than class labels. Our resulting algorithm is a probabilistic clustering
model (mixture model) that successfully generalizes the information in the hints to
out-of-sample data.
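To illustrate how pairwise hints can act as a prior on cluster assignments, the toy sketch below enumerates every joint assignment of three 1-D points to two unit-variance Gaussian components. It assumes a prior in which each hinted pair contributes a factor exp(w) when the two points share a cluster (w > 0 encourages co-clustering, w < 0 discourages it). This form of the prior, the brute-force enumeration, and all names are assumptions for illustration; the published algorithm fits a mixture model and does not enumerate assignments.

```python
import itertools
import math

def assignment_posterior(xs, means, mix, weights):
    """Posterior over joint assignments, by brute-force enumeration."""
    k = len(means)
    scores = {}
    for z in itertools.product(range(k), repeat=len(xs)):
        log_s = 0.0
        # Mixture prior and unit-variance Gaussian likelihood for each point.
        for x, zi in zip(xs, z):
            log_s += math.log(mix[zi]) - 0.5 * (x - means[zi]) ** 2
        # Pairwise hints: add w whenever the hinted pair shares a cluster.
        for (i, j), w in weights.items():
            if z[i] == z[j]:
                log_s += w
        scores[z] = math.exp(log_s)
    total = sum(scores.values())
    return {z: s / total for z, s in scores.items()}

def marginal(posterior, i, k):
    """Probability that point i is assigned to cluster k."""
    return sum(p for z, p in posterior.items() if z[i] == k)

xs = [-2.0, 0.0, 2.0]            # the middle point is ambiguous
means, mix = [-2.0, 2.0], [0.5, 0.5]
no_hint = assignment_posterior(xs, means, mix, {})
with_hint = assignment_posterior(xs, means, mix, {(1, 2): 3.0})
p_before = marginal(no_hint, 1, 1)
p_after = marginal(with_hint, 1, 1)
```

In this toy case the hint linking the ambiguous middle point to the right-hand point shifts its assignment probability toward the right-hand cluster, without any class label being supplied.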

This algorithmic development and its application to the Greenland image data were
published in NIPS 17 (see Publications). We are also drafting a journal article on this
material for submission in June 2005.


Educational Activity 

This award supported a portion of the doctoral studies of Cynthia Archer. She received 
her Ph.D. degree in June, 2002. Dr. Archer is now employed at the Portland, OR office 
of Research Triangle Park. 

This award supported a portion of the doctoral studies of Zhengdong Lu. Zhengdong is
currently a Ph.D. student in the PI's lab.

The award also funded research activities of a postdoctoral researcher, Dr. Alex
Nelson, who worked with us on the helicopter gearbox data during the fall and early
winter of 2002, and also worked on preliminary aspects of the segmentation of EOS
data. Dr. Nelson is now employed in biomedical signal processing at Inovise.



Publications 


Zhengdong Lu and Todd K. Leen. Semi-supervised Learning with Penalized
Probabilistic Clustering. In Advances in Neural Information Processing Systems 17,
Saul, Weiss, and Bottou (eds), The MIT Press, 2005.

Zhengdong Lu and Todd K. Leen. Prior Knowledge for Probabilistic Clustering. In 
preparation for submission to Neural Computation. The target submission date is June
13, 2005.


Patent Activity - None 


Ancillary Materials 

Presentations from the IS PI workshops are appended below. 



Building Better Clusters 


Unsupervised Classification 
for Novelty Detection 

Towards Application to Failure Prediction 

Sept 4, 2002 


Cynthia Archer, Lu Zhengdong, 
Todd Leen 



Todd K. Leen 
OGI - OHSU Sept. 4, 2002 



Motivation and Algorithm Grounding 

Outlier detection to identify anomalies 
Accurate models of healthy baseline 

- “healthy” must be conditioned on operating state - mixture 
or local models for nonstationarity 


[Figure: a test point labeled "Healthy or faulty?" plotted against the healthy
distributions for operating state A and operating state B]




Clustering Approaches 



• Clustering Gaussian Mixture Density Models 

- How many clusters? 

- What shape (constraints of mixture components)? 

- Dimensionality for PCA-based clustering? 


e.g. Helicopter gearbox RMS vibration signal from 
6 accelerometers in 14 different maneuvers 









New Algorithm 

Entropy-Constrained Adaptive PCA 



• Clustering based on constrained Gaussian mixture model. 
Constraints related to PCA and factor analysis (Basilevsky, 
Tipping and Bishop) 

• Structure includes model resolution parameter (or
observation noise variance) $\sigma^2$

• Formalism leads to entropy-penalized (regularized) cost
function directly from likelihood maximization.

• Locally adjusts cluster dimensionality and shape to data.

• Includes unconstrained mixture models and entropy-penalized
k-means as special cases.

• Number of clusters selected by cost minimization on holdout 
set. 


Makes inspired choices for number of clusters. Functions 
well for unsupervised classification. 






What Else Does it Do? 



• High-D example that can be visualized - unsupervised texture 
segmentation 




Texture Segmentation 



Number of clusters chosen to minimize corresponding 
clustering cost - not to optimize texture segmentation 
performance. 


[Figure panels: Entropy-constrained k-means; Standard Gaussian mixture; Entropy-constrained APCA]
Gearbox Vibration


• 14 maneuvers, human-clustered into 9 classes 

• Features - RMS power in each of 6 accelerometers from 3 
different synchronous averaging periods, 18-dim space, pruned 
to 7 based on discriminative ability 

• Clustering via entropy-constrained k-means, standard Gaussian 
mixtures, and entropy-constrained APCA. Evaluate clusters as 
classifier. 

• Results 

- Unconstrained Gaussian mixtures severely underestimate 
number of clusters (3), poor discrimination between real 
classes (37% classification rate). 

- Entropy-constrained k-means produces good classification 
(89%) by grossly overestimating number of clusters (28) 

- Entropy-constrained APCA likes 6 clusters, gives 65% 
classification rate, cluster dimensions from 0 to 5. 
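The slides do not say how clusters were scored as a classifier. One common convention, assumed here, is to label each cluster with the majority class of its members and report the fraction of points whose class matches their cluster's label. The function name and the toy maneuver labels below are hypothetical.

```python
from collections import Counter, defaultdict

def cluster_classification_rate(cluster_ids, true_labels):
    """Fraction of points whose class matches their cluster's majority class."""
    by_cluster = defaultdict(list)
    for c, y in zip(cluster_ids, true_labels):
        by_cluster[c].append(y)
    # Each cluster contributes the count of its most frequent class.
    correct = sum(Counter(ys).most_common(1)[0][1] for ys in by_cluster.values())
    return correct / len(true_labels)

# Hypothetical toy example with three clusters and three maneuver classes.
cluster_ids = [0, 0, 0, 1, 1, 2, 2, 2]
true_labels = ["hover", "hover", "climb", "climb", "climb", "turn", "turn", "hover"]
rate = cluster_classification_rate(cluster_ids, true_labels)
```

Under this scoring rule, splitting a cluster into smaller ones can never lower the rate, which is one reason a model with many more clusters than classes can still score well.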







Outstanding Issues 



• Choosing model resolution $\sigma^2$ via cross-validation.
Seems to consistently underestimate - estimation bias?

• How to do feature selection for clustering? 

• Figure-of-merit for cluster-based unsupervised 
classifiers? 

• How to do real-time operating state conditioning for 
helicopter data. Operating state - quantized or 
continuous? 

• What about real texture segmentation? 


• Applications to other environmental science datasets? 
Dynamical regime identification by clustering? 



New Clustering Framework 

Clustering based on constrained Gaussian mixture models 



- Latent variable generative model constraint structure 
related to PCA / FA 


- Automatically tunes to local data dimensionality 

- Generates entropy-penalized (i.e. regularized) cost function
directly from likelihood maximization

- Automatic selection of number of clusters by likelihood 
maximization on holdout data. 


- Appears to work well for unsupervised classification. 



Latent space s 

Maps $W_i$ from s to data space x (fit)
Additive noise - variance $\sigma^2$
(resolution control parameter, not fit)
Rank($W_i$) determined by data & $\sigma^2$
sets local cluster dimension

Mixture-PCA Density Model



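• Density model

    $p(x) = \sum_\alpha \pi_\alpha \, p(x \mid \alpha)$,

with

    $p(x \mid \alpha) = N(\mu_\alpha,\ \sigma^2 I + W_\alpha W_\alpha^T)$

- $\sigma^2$ sets model resolution

- $W_\alpha W_\alpha^T$ defines orientation and eigenvalues
for local PCA subspaces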



• Soft-clustering through posterior $p(\alpha \mid x)$

• Hard-clustering limit of data likelihood leads to a cost function 
for entropy-constrained clustering - entropy-constrained 
adaptive PCA (EC-APCA). 
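As a sketch of how an entropy-penalized cost can arise in the hard-clustering limit, consider a Gaussian mixture with mixing weights $\pi_\alpha$, means $\mu_\alpha$, and component covariance $C_\alpha = \sigma^2 I + W_\alpha W_\alpha^T$. The symbols $C_\alpha$, $E$, $N$, and $N_\alpha$ are introduced here for illustration, and the exact cost used in EC-APCA may differ in scaling or detail.

```latex
-\log\left[\pi_\alpha \, p(x \mid \alpha)\right]
  = \tfrac{1}{2}\,(x-\mu_\alpha)^T C_\alpha^{-1} (x-\mu_\alpha)
  + \tfrac{1}{2}\,\log\det\left(2\pi C_\alpha\right)
  - \log \pi_\alpha ,
\qquad C_\alpha = \sigma^2 I + W_\alpha W_\alpha^T

E = \sum_{n=1}^{N} \min_\alpha
    \left\{ \tfrac{1}{2}\,(x_n-\mu_\alpha)^T C_\alpha^{-1} (x_n-\mu_\alpha)
    + \tfrac{1}{2}\,\log\det\left(2\pi C_\alpha\right)
    - \log \pi_\alpha \right\}
```

Summed over the data with each point assigned to its best component, the last term gives $-\sum_\alpha N_\alpha \log \pi_\alpha$, where $N_\alpha$ is the number of points assigned to component $\alpha$. For $\pi_\alpha = N_\alpha / N$ this is $N$ times the entropy of the cluster assignments, which is the sense in which the cost penalizes models that spread the data over many clusters.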

Todd K. Leen 

OGI - OHSU Feb. 4, 2004



Entropy-Constrained Clustering 


• Automatically tunes to local data dimensionality 

• Generates entropy-penalized (i.e. regularized) cost function
directly from likelihood maximization

• Includes unconstrained mixture models and entropy-penalized
k-means as special cases.

• Number of clusters selected by cross-validation.

• Selection of model resolution parameter $\sigma^2$ by
cross-validation (with variable results).









Application to Helicopter 
Gearbox Vibration 



• Classify maneuver (flight “state”) from vibration information. 
Surrogate task for fault detection. 

• Features - RMS power in each of 6 accelerometers from 3 different 
synchronous averaging periods, 18-dim space, pruned to 7 based on 
discriminative ability 

- Clustering via entropy-constrained k-means, standard Gaussian 
mixtures, and entropy-constrained APCA. Evaluate clusters as 
classifier. (Classification results ~ comparable to supervised 
learning.) 

• Features - Welch power spectra of time-synchronous averaged (TSA) 
time series. 


- Normalize spectra to unit power, concatenate spectra from several 
gear TSA. Marginally less accurate than clustering via RMS power. 

Long-term (~34 sec) TSA noted (Huff / Mosher, NASA Ames) to be
non-stationary. But clustering over short-term TSA provides poor
maneuver classification. Suggests need for more detailed
state description than maneuver only.






Image Segmentation 



• Success with texture segmentation (reported last year) 
suggested application to image segmentation of earth-observing 
data. 

• Unlabeled image data - how to evaluate unsupervised 
segmentation? 

Compare with human clustering ... 

~ 68% agreement with 2 different human clusterings. 

Agreement between humans is about 70% 





Clustering Image Blocks 


Led to partially-supervised mixture-based clustering 

- Gaussian mixture model for data density / clustering 

- Incorporate pairwise “opinions” into prior on assignment 
of image blocks to mixture components 

(clusters). 

- “Penalized Probabilistic Clustering” (PPC) 






Satellite Image Data 




Partially-labeled region 
- Labeled into 2 class-sets 

• Snow area: wet snow, dry 
snow, melt ponds, bare ice 

• Non-snow area: water, 
clouds, bare land 






50% data for training and 50% data 
for test 


Classification accuracies are
averaged over 20 runs

Effect of constraints in training 
properly generalizes to test set 




[Figure: classification accuracies 75.05%, 99.23%, 97.15%]


Clustering Framework

Clustering based on constrained Gaussian mixture models 



- Latent variable generative model constraint structure 
related to PCA / FA 


- Automatically tunes to local data dimensionality 

- Generates entropy-penalized (i.e. regularized) cost function
directly from likelihood maximization

- Automatic selection of number of clusters by cross-validation



Latent space s 

Maps $W_i$ from s to data space x (fit)
Additive noise - variance $\sigma^2$
(resolution control parameter, not fit)
Rank($W_i$) determined by data & $\sigma^2$
sets local cluster dimension

Features for Clustering



• Welch-averaged power spectra of the TSA data, 

- select the appropriate FFT length using a qualitative 
bias/variance tradeoff. 

- combined the spectra of 6 accelerometers into a single 
feature vector. 


- Three different methods were investigated for this
combination:



• Concatenation without scaling. 

- preserves frequency information, relative power between 
channels, total power 

• Concatenation followed by Normalization to unit vector 
magnitude. 

- preserves frequency information, relative power between 
channels, not total power 

• Normalization followed by Concatenation. 

- preserves frequency information only, no power information 
retained 
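The three combination schemes listed above can be sketched as follows. The sketch assumes that normalization means scaling to unit Euclidean norm, which matches "unit vector magnitude" but may not match the exact convention used in the study; the function names and the two-channel toy spectra are hypothetical.

```python
import math

def unit_norm(vec):
    """Scale a vector to unit Euclidean norm (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

def concatenate(spectra):
    """Join per-accelerometer spectra into one feature vector."""
    return [v for spec in spectra for v in spec]

def concat_only(spectra):
    # Keeps frequency content, relative power between channels, and total power.
    return concatenate(spectra)

def concat_then_normalize(spectra):
    # Keeps frequency content and relative power between channels; drops total power.
    return unit_norm(concatenate(spectra))

def normalize_then_concat(spectra):
    # Keeps only the frequency content (shape) of each channel's spectrum.
    return concatenate([unit_norm(s) for s in spectra])

# Hypothetical toy spectra: channel 2 has the same shape as channel 1, twice the level.
spectra = [[3.0, 4.0], [6.0, 8.0]]
a = concat_only(spectra)
b = concat_then_normalize(spectra)
c = normalize_then_concat(spectra)
```

In the toy example the second channel remains twice the first in `b`, while in `c` the two channels become indistinguishable, which is the loss of relative power information noted above.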


Combining Features



• Normalize-Concatenate gave superior clustering accuracy 
for all three gear TSAs, and for both APCA and ECVQ. 
However, 

- normalization removes information about relative RMS 
power between accelerometers, as well as removing 
RMS differences between examples. 

- Cynthia reported 89% accuracy using RMS features from 
the 3 gear-TSAs and six accelerometers 

• Handpicked 7 features using all gears and 
accelerometers. 

• So we do best by discarding RMS, even though Cynthia
found it to be a useful feature!